Abstract:
Web page classification is the process of classifying documents into predefined categories based on their content. In data mining, the task is to perform this classification automatically. Many algorithms have been developed for automatic text classification; the most common techniques used for this purpose include the Apriori Algorithm and the Naive Bayes Classifier. The Apriori Algorithm finds interesting association or correlation relationships among a large set of data items, and discovering these relationships in huge numbers of transaction records can support many decision-making processes. The Naive Bayes Classifier uses maximum a posteriori (MAP) estimation to learn a classifier; here it is used to calculate the probabilities of keywords across large itemsets. This technique is efficient for web page classification, and it will be more effective if the training set is constructed so that it generates more itemsets. Although the experimental results are quite encouraging, it would be better to evaluate the work on larger data sets with more classes.
Keywords: Association Rule, Apriori Algorithm, Naïve Bayes Classifier
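The abstract names the two techniques without showing how they connect. The Python sketch below is one plausible pipeline under stated assumptions, not necessarily the paper's method: each web page is assumed to be reduced to a list of keywords; Apriori is run over those keyword sets, and every keyword occurring in a frequent itemset is kept as the vocabulary; a multinomial Naive Bayes classifier with Laplace (add-one) smoothing then assigns the class with the highest posterior probability (the MAP decision rule). The names `apriori`, `KeywordNaiveBayes`, the toy pages and `min_support=2` are hypothetical illustrations.

```python
import math
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support_count}
    for every itemset contained in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    item_counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): n for i, n in item_counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: keep only candidates whose (k-1)-subsets are all frequent.
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support for each surviving candidate in one pass over the data.
        cand_counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c: n for c, n in cand_counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


class KeywordNaiveBayes:
    """Hypothetical multinomial Naive Bayes over keyword lists (add-one smoothing)."""

    def fit(self, docs, labels, vocabulary):
        self.vocab = set(vocabulary)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d if w in self.vocab)
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            self.log_likelihood[c] = {
                w: math.log((counts[w] + 1) / denom) for w in self.vocab
            }
        return self

    def predict(self, doc):
        # MAP decision: argmax over c of log P(c) + sum of log P(w | c);
        # keywords outside the Apriori-selected vocabulary are ignored.
        def log_posterior(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][w] for w in doc if w in self.vocab
            )
        return max(self.classes, key=log_posterior)


# Hypothetical toy training pages, each already reduced to keywords.
train = [
    (["match", "goal", "team", "score"], "sports"),
    (["team", "coach", "goal", "league"], "sports"),
    (["match", "team", "score", "win"], "sports"),
    (["stock", "market", "price", "trade"], "finance"),
    (["market", "bank", "price", "rate"], "finance"),
    (["stock", "price", "investor", "market"], "finance"),
]
docs = [d for d, _ in train]
labels = [y for _, y in train]

freq = apriori(docs, min_support=2)
vocab = {w for itemset in freq for w in itemset}
clf = KeywordNaiveBayes().fit(docs, labels, vocab)

print(clf.predict(["team", "goal", "win"]))      # sports
print(clf.predict(["price", "market", "bank"]))  # finance
```

Using Apriori only to select the keyword vocabulary is the simplest way to chain the two steps in this sketch; treating whole frequent itemsets as features would be another option, and which one the paper uses is not stated in the abstract.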